Previously, I showed how to use -margins- after -ml-. I motivated that exercise as an example of how to estimate nonlinear models.
The model, however, was not itself nonlinear. It was, in fact, a linear regression model estimated by maximum likelihood, imposing the assumption of normality on the errors of the model.
On the bright side, the outcomes of interest, the standard deviation of the error and the predicted outcome in its original scale (not in logs), were nonlinear functions of the model parameters.
This time I'll go through a similar exercise, using a more common nonlinear model: the probit model.
As in the previous example, I'll walk you through setting up the -ml- program and the prediction program.
First things first. Unless you already have this program saved somewhere in your accessible ado-files (most likely the "ado/personal" folder), make sure to have the following program in memory. It simply passes its arguments to -ereturn-, which lets you add information (such as the name of a predict program) to e() after estimation.
. program adde, eclass
  1.         ereturn `0'
  2. end
So, let's start.
The probit model is a nonlinear model that can be used when your dependent variable is binary (0/1) and your intention is to model the "probability of success (y=1)", \( P(y=1|X) \), given a set of characteristics.
From a technical point of view, you can think about the probit model in two ways.

The first is as a model for the conditional probability of success:

$$ E(y|X) = P(y=1|X) = \Phi(X\beta) $$

The second is as a latent-variable model, where an unobserved index \( y^* \) determines the outcome we actually observe:

$$ y^* = X\beta + e_i $$
$$ y^* > 0 \rightarrow y = 1 $$
$$ y^* \leq 0 \rightarrow y = 0 $$
There is, of course, the added assumption that, for this to be a probit model, the error \( e_i \sim N(0,1) \) follows a standard normal distribution. Under this assumption, determining the probability of success \( P(y=1|X) \) is equivalent to determining the probability that the latent index is positive, \( P(y^*>0|X) \).
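To see why, note that under the standard normal assumption (using its symmetry in the last step):

$$ P(y=1|X) = P(y^*>0|X) = P(e_i > -X\beta) = 1 - \Phi(-X\beta) = \Phi(X\beta) $$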
Using either approach, the likelihood of a single observation (this will be important for the MLE) is:

$$ L_i = \Phi(X\beta) \quad \text{if } y = 1 \text{ (success)} $$
$$ L_i = 1-\Phi(X\beta) \quad \text{if } y = 0 \text{ (failure)} $$

Remember that we will be using the LOG of this expression in the -ml- program. And of course, we also have the outcome of interest:

$$ P(y=1|X) = \Phi(X\beta) $$

where \( \Phi \) is the cumulative distribution function (CDF) of the standard normal.
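Putting the two cases together, the sample log-likelihood that -ml- will maximize can be written compactly as:

$$ \ln L = \sum_{i=1}^{N} \Big[ y_i \ln\Phi(X_i\beta) + (1-y_i)\ln\big(1-\Phi(X_i\beta)\big) \Big] $$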
With that in place, the first step is to write the log-likelihood program for the probit model.
. program myprobit
  1.         args lnf xb
  2.         local p1 normal(`xb')
  3.         local p0 normal(-`xb')
  4.         local y $ML_y1
  5.         qui: replace `lnf' = log(`p1') if `y'==1
  6.         qui: replace `lnf' = log(`p0') if `y'==0
  7. end
Notice that this program has only two arguments: `lnf', the variable that will store the log-likelihood, and `xb', the variable that captures the observed component of the latent index.
To make the code easier to read, I use some locals to identify the different components of the log-likelihood.
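As a side note (not required for the rest of the example), a slightly more numerically robust sketch of the same evaluator could use Stata's lnnormal() function, which computes \( \ln\Phi(\cdot) \) directly instead of taking the log of a probability that may round to zero. The name myprobit2 is just illustrative:

program myprobit2
        args lnf xb
        * lnnormal() returns the log of the standard normal CDF
        qui: replace `lnf' = lnnormal( `xb') if $ML_y1 == 1
        qui: replace `lnf' = lnnormal(-`xb') if $ML_y1 == 0
end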
Now we just need to write the "predict" program:
. program myprobit_p
  1.         syntax newvarname [if] [in] , [ pr odds *]
  2.         if "`pr'`odds'" == "" {
  3.                 ml_p `0'
  4.         }
  5.         marksample touse, novarlist
  6.         if "`pr'" != "" {
  7.                 tempvar xb
  8.                 _predict double `xb' , eq(#1)
  9.                 gen `typlist' `varlist' = normal(`xb') if `touse'
 10.                 label var `varlist' "P(y=1|X)"
 11.         }
 12.         else if "`odds'" != "" {
 13.                 tempvar xb
 14.                 _predict double `xb' , eq(#1)
 15.                 gen `typlist' `varlist' = normal(`xb')/normal(-`xb') if `touse'
 16.                 label var `varlist' "P(y=1|X)/P(y=0|X)"
 17.         }
 18. end
This program, -myprobit_p-, has two options: -pr-, which predicts the probability of success \( P(y=1|X) \), and -odds-, which predicts the odds of success \( P(y=1|X)/P(y=0|X) \). If neither option is specified, it falls back on the default -ml_p- predictions.
All right. With all this information in hand, we can now estimate our probit model.
To keep things reproducible, I will use one of the example datasets used in the Stata help files:
. webuse union, clear
(NLS Women 14-24 in 1968)
Now let's estimate the model using -ml-. Here we indicate the method (lf) and the program that defines the log-likelihood.
. ml model lf myprobit (union = age grade not_smsa south##c.year) , maximize

initial:       log likelihood = -18160.456
alternative:   log likelihood = -14355.672
rescale:       log likelihood = -14220.454
Iteration 0:   log likelihood = -14220.454
Iteration 1:   log likelihood = -13547.574
Iteration 2:   log likelihood = -13544.385
Iteration 3:   log likelihood = -13544.385

. ml display

                                                Number of obs     =     26,200
                                                Wald chi2(6)      =     618.81
Log likelihood = -13544.385                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0118481   .0029072     4.08   0.000     .0061502     .017546
       grade |   .0267365   .0036689     7.29   0.000     .0195457    .0339273
    not_smsa |  -.1293525   .0202595    -6.38   0.000    -.1690604   -.0896445
     1.south |  -.8281078   .2472219    -3.35   0.001    -1.312654   -.3435619
        year |  -.0080931   .0033469    -2.42   0.016    -.0146529   -.0015333
             |
south#c.year |
           1 |    .005737   .0030917     1.86   0.064    -.0003226    .0117965
             |
       _cons |  -.6542487   .2007777    -3.26   0.001    -1.047766   -.2607316
------------------------------------------------------------------------------
You can compare the results with the standard -probit-.
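If you want to verify this, the equivalent built-in commands would be something along these lines (output omitted here):

. probit union age grade not_smsa south##c.year
. margins, dydx(*)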
Next, we need to "add" our prediction program to e():
. adde local predict myprobit_p
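Before calling -margins-, you can check that the prediction program is picked up correctly by generating the predictions yourself (the variable names phat and podds are just illustrative):

. predict double phat , pr
. predict double podds, odds
. summarize phat podds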
And that's it. We can now estimate marginal effects for the probit, for either of the two outcomes of interest defined in the predict program: the probability of success or the odds of success.
. margins, dydx(*) predict(pr)

Average marginal effects                        Number of obs     =     26,200
Model VCE    : OIM

Expression   : P(y=1|X), predict(pr)
dy/dx w.r.t. : age grade not_smsa 1.south year

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .003442    .000844     4.08   0.000     .0017878    .0050963
       grade |   .0077673   .0010639     7.30   0.000     .0056822    .0098525
    not_smsa |  -.0375788   .0058753    -6.40   0.000    -.0490941   -.0260634
     1.south |  -.1054928   .0050851   -20.75   0.000    -.1154594   -.0955261
        year |  -.0017906   .0009195    -1.95   0.051    -.0035928    .0000115
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins, dydx(*) predict(odds)

Average marginal effects                        Number of obs     =     26,200
Model VCE    : OIM

Expression   : P(y=1|X)/P(y=0|X), predict(odds)
dy/dx w.r.t. : age grade not_smsa 1.south year

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0059585    .001467     4.06   0.000     .0030832    .0088338
       grade |    .013446   .0018647     7.21   0.000     .0097913    .0171007
    not_smsa |  -.0650526   .0102657    -6.34   0.000     -.085173   -.0449322
     1.south |  -.1720679   .0084276   -20.42   0.000    -.1885856   -.1555501
        year |  -.0032757    .001598    -2.05   0.040    -.0064076   -.0001438
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins south, predict(odds)

Predictive margins                              Number of obs     =     26,200
Model VCE    : OIM

Expression   : P(y=1|X)/P(y=0|X), predict(odds)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       south |
          0  |   .3626616   .0066416    54.60   0.000     .3496442     .375679
          1  |   .1905937   .0051289    37.16   0.000     .1805412    .2006462
------------------------------------------------------------------------------
So, how do we interpret the results? Here is one possibility:
People living in the South are, on average, about 10.5 percentage points less likely to belong to a union.
Or: living in the South reduces the odds of belonging to a union by 0.172 points (from 0.3627 to 0.1906).
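That last number is simply the difference between the two predictive margins reported above:

$$ 0.1906 - 0.3627 \approx -0.172 $$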
Thanks for reading.